This Is What a Data Scientist Looks Like

Data Scientists: Who are they? What do they know? Do they know things? Let’s find out!

Lida Zacharopoulou
Towards Data Science


Photo by ThisisEngineering RAEng on Unsplash

Back in 2012, Harvard Business Review called data scientist the “sexiest job” of the 21st century, and the hype has been growing ever since. But what is the background of a data scientist? If you are looking for a data scientist job, you have probably come across postings asking for anything from a Ph.D. in Statistics to years of experience with software engineering tools. So I was wondering: what is the educational background of a data scientist, and how does it differ from that of developers in general?

The developer world is still very male-dominated. As a woman in tech with a computer science background, I have been the only female software engineer on a team of about fifteen people in previous jobs. So, besides educational background, I was curious to see whether data scientists are doing better than other developer types in terms of women’s representation in the field.

Finally, let’s talk money. Does all the buzz around data scientists translate to more dollars in the bank? Are data scientists actually paid more than other developers and what correlates with a higher salary for a data scientist?

To answer all of these questions, I analyzed the 2019 Stack Overflow Developer Survey. The dataset is publicly available here.

Let’s see what we found out!

What is the educational background of data scientists and how is it different from developers in general?

We would like to know the highest level of education data scientists have completed, and whether more data scientists than developers in general come from a background other than computer science and software engineering.

Do more data scientists than other developers hold a degree above a Bachelor’s (a Master’s or a Ph.D.)?

Highest level of education completed for data scientists vs developers in general

Here, we see a clear difference between data scientists and developers in general. About 60% of data scientists have obtained a Master’s or a doctoral degree, compared to around 30% of all developers.
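A minimal sketch of how such a comparison could be computed, assuming the survey stores a respondent’s roles as one semicolon-separated `DevType` string. The exact role label is our assumption about the survey’s wording, the degree labels are simplified placeholders, and the toy respondents are made up.

```python
# Assumed DevType label; EdLevel values are simplified, hypothetical placeholders.
DATA_SCIENTIST = "Data scientist or machine learning specialist"
GRADUATE_LEVELS = {"Master's", "Doctoral"}


def is_data_scientist(dev_type):
    """DevType holds several roles separated by ';' (missing answers are skipped)."""
    return isinstance(dev_type, str) and DATA_SCIENTIST in dev_type.split(";")


def graduate_share(rows):
    """Fraction of respondents whose highest degree is a Master's or a doctorate."""
    if not rows:
        return 0.0
    return sum(r["EdLevel"] in GRADUATE_LEVELS for r in rows) / len(rows)


respondents = [
    {"DevType": "Developer, back-end", "EdLevel": "Bachelor's"},
    {"DevType": "Data scientist or machine learning specialist;Developer, back-end", "EdLevel": "Master's"},
    {"DevType": "Data scientist or machine learning specialist", "EdLevel": "Doctoral"},
    {"DevType": "Developer, front-end", "EdLevel": "Bachelor's"},
    {"DevType": "Data scientist or machine learning specialist", "EdLevel": "Bachelor's"},
    {"DevType": "Developer, front-end", "EdLevel": "Master's"},
]
data_scientists = [r for r in respondents if is_data_scientist(r["DevType"])]
print(f"{graduate_share(data_scientists):.0%} vs {graduate_share(respondents):.0%}")  # 67% vs 50%
```

Checking membership in the split list, rather than testing the raw string for equality, means respondents who hold several roles at once still count as data scientists.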

Are there more data scientists coming from a non-CS background than other developers?

Undergraduate major for data scientists vs developers in general

Again, as expected, there are clear differences between data scientists and other developers. Fewer than half of professional data scientists (48%) have completed a computer science-related major, compared to 61% of all developers. About 14% of data scientists studied math and 13% a natural science, whereas the corresponding figures for developers in general are about 4.5% each.
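Each figure above is a share within one group. This sketch, with simplified hypothetical major labels and made-up answers, computes those shares over the respondents who answered the question. Excluding skipped answers from the denominator is our assumption, not something the analysis states.

```python
from collections import Counter


def major_shares(majors):
    """Share of each undergraduate major among respondents who answered (None = skipped)."""
    counts = Counter(m for m in majors if m is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {major: n / total for major, n in counts.items()}


# Hypothetical answers from a handful of data scientists.
ds_majors = ["Computer science", "Mathematics", "Natural science", "Computer science", None]
shares = major_shares(ds_majors)
print(shares["Computer science"])  # 0.5
```

Running the same function on the full respondent list gives the “developers in general” column, so the two groups are compared on the same footing.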

Are there any differences between data scientists and developers of any role in the types of non-degree education they have participated in?

Other types of education (online courses, bootcamps, etc.) compared

There are no big differences here between data scientists and developers in general. More than 4 out of 5 professional developers and data scientists have taught themselves a new language, framework, or tool, the most popular type of non-degree education for both groups. For the second most popular choice, taking an online course, there is a small difference: 70% of data scientists have done so, compared to 57% of developers in general.
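Non-degree education is a multi-select question, so each share is computed per respondent rather than per selected option. That is why the percentages for a single group can add up to more than 100%. A sketch with hypothetical shortened option labels, assuming answers are stored as semicolon-separated strings:

```python
from collections import Counter


def option_shares(answers):
    """Share of respondents who ticked each option of a ';'-separated multi-select question."""
    answered = [a for a in answers if a]  # skip respondents who left the question blank
    counts = Counter(opt for a in answered for opt in set(a.split(";")))
    return {opt: n / len(answered) for opt, n in counts.items()}


# Hypothetical, shortened option labels.
answers = [
    "Self-taught;Online course",
    "Self-taught",
    "Online course;Bootcamp",
    "Self-taught;Online course",
    None,
]
shares = option_shares(answers)
print(shares)
print(sum(shares.values()))  # 1.75: shares of a multi-select question can exceed 100% in total
```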

Is the gender gap smaller in data scientists than in other roles?

Percentage of respondents by the gender they identified with

We clearly see that there are dramatically more men than women, and the differences between all developers and data scientists are very small. The total number of male professional developers who participated in the survey was 58387, whereas the number of female professional developers was 4669. Only 8% of all professional developers and 8.3% of data scientists identified as women. Therefore, in order to better understand the representation of men and women in different developer roles, we will group the data by developer role.

The ratio of men to women for different developer roles

The graph illustrates the men-to-women ratio for different developer roles. The blue line shows the average ratio of 12.5, meaning that across all developer roles there are 12.5 times more men than women.
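The per-role ratio is the number of men divided by the number of women in each role. With the totals quoted above, 58387 / 4669 gives the overall 12.5. The sketch below is hypothetical in two ways: it assumes one simplified `Gender` value per respondent, and it counts a respondent who lists several roles once in each of them. The respondents are made up.

```python
from collections import defaultdict


def men_to_women_ratio(rows):
    """Men per woman for each role; respondents listing several roles count once in each."""
    counts = defaultdict(lambda: {"Man": 0, "Woman": 0})
    for r in rows:
        if r["Gender"] not in ("Man", "Woman"):
            continue  # other or missing answers are left out of this ratio
        for role in r["DevType"].split(";"):
            counts[role][r["Gender"]] += 1
    # Roles with no women are skipped to avoid dividing by zero.
    return {role: c["Man"] / c["Woman"] for role, c in counts.items() if c["Woman"] > 0}


rows = [
    {"Gender": "Man", "DevType": "Data scientist;DevOps specialist"},
    {"Gender": "Man", "DevType": "Data scientist"},
    {"Gender": "Man", "DevType": "Data scientist"},
    {"Gender": "Woman", "DevType": "Data scientist"},
    {"Gender": "Man", "DevType": "DevOps specialist"},
    {"Gender": None, "DevType": "DevOps specialist"},
]
print(men_to_women_ratio(rows))  # {'Data scientist': 3.0}
print(round(58387 / 4669, 1))    # 12.5, the overall ratio from the survey totals
```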

Although we initially couldn’t tell whether the gender gap was smaller among data scientists than among developers in general, splitting by role shows that there are about 12 male data scientists for every female data scientist. This is one of the best ratios, together with Front-end Developers (11.8) and Data Analysts (12.7). In certain roles, on the other hand, the gender gap grows much bigger: DevOps Specialists had the worst ratio (32.1), followed by System Administrators (29.7) and Site Reliability Engineers (28.4).

The results show that data scientists have one of the best levels of women’s representation among developer roles, together with front-end developers and followed by data analysts. Although this ratio looks good next to roles like DevOps Specialists, where men outnumber women 32 to 1, there is still a long way to go to bridge the gender gap in the industry.

If it don’t make dollars, it don’t make sense.

Now it’s time to dive into one of the most interesting parts of the dataset: are data scientists actually paid more than other developer roles, and what correlates with a higher salary? The questions we would like to answer are:

  • Do data scientists get paid more than other roles for the same level of experience?
  • How does the level of education correlate with salary? Does a Ph.D. pay more?
  • Does a software engineering background lead to a higher salary for data scientists compared to other educational backgrounds?
  • Which combination of role titles gets paid more? What company size pays data scientists more?

Salary distribution

Salary distribution for data scientists

Before we start, we looked at the distribution of salary, which was far from normal. The salary field is each respondent’s compensation converted to an annual USD amount using the exchange rate on 2019-02-01, assuming 12 working months and 50 working weeks. It had a few extremely high values that skewed the distribution to the right, so I chose to ignore values above a threshold. Since the data is not normally distributed, we use the median rather than the mean salary in the analysis, as it is more representative.

Salary distribution for data scientists after removing outliers
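The outlier step described above comes down to two choices: drop values above a cut-off, then summarize with the median. A minimal sketch with made-up salaries. The 250,000 threshold is purely hypothetical, since the analysis does not state the value it used.

```python
from statistics import mean, median


def drop_outliers(salaries, threshold):
    """Keep reported salaries at or below the cut-off; None marks a missing answer."""
    return [s for s in salaries if s is not None and s <= threshold]


# Made-up annual USD salaries with one extreme value.
salaries = [40_000, 52_000, 58_000, 66_000, 75_000, 2_000_000, None]
reported = [s for s in salaries if s is not None]
kept = drop_outliers(salaries, threshold=250_000)  # hypothetical cut-off

print(mean(reported), median(reported))  # one outlier drags the mean far above the median
print(mean(kept), median(kept))          # 58200 58000
```

A single extreme value moves the mean by more than 300k, while the median barely changes. This is why the median is the safer summary for a skewed distribution, even after trimming.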

Do data scientists get paid more than other developer roles?

TL;DR Yes.

Median salary to average professional experience by developer role

The annual median salary for all types of developers was $58.5k and for data scientists $66k. For all developers, salaries are higher the more experience they have in coding professionally. Some types of developers, however, are paid more than other roles for the same level of experience.

The graph above illustrates that Data Scientists are one of the highest-paid roles, along with Data Engineers, DevOps Specialists, and Site Reliability Engineers. The x-axis shows the average years of professional coding experience, the y-axis shows the median global salary in USD per year, and the size of each bubble corresponds to the number of respondents in that role. Note that I chose to analyze only a subset of developer roles, although the dataset includes more, such as executives and educators.

What is particularly interesting is that among developers with the least average professional experience, data scientist is the top-paying role!
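A sketch of how the numbers behind each bubble could be computed: median salary, average years of professional coding experience, and respondent count per role. The field names (`DevType`, `Salary`, `YearsCodePro`) and the numbers assigned to the open-ended answers “Less than 1 year” and “More than 50 years” are assumptions. Respondents listing several roles are counted in each.

```python
from collections import defaultdict
from statistics import mean, median

# Assumed numeric stand-ins for the two open-ended experience answers.
SPECIAL_YEARS = {"Less than 1 year": 0.5, "More than 50 years": 51.0}


def parse_years(value):
    """Convert a years-of-experience answer to a number of years."""
    return SPECIAL_YEARS[value] if value in SPECIAL_YEARS else float(value)


def role_summary(rows):
    """Median salary, mean experience and respondent count for each role."""
    by_role = defaultdict(list)
    for r in rows:
        for role in r["DevType"].split(";"):
            by_role[role].append(r)
    return {
        role: {
            "median_salary": median(r["Salary"] for r in members),
            "mean_years": mean(parse_years(r["YearsCodePro"]) for r in members),
            "respondents": len(members),
        }
        for role, members in by_role.items()
    }


rows = [
    {"DevType": "Data scientist or machine learning specialist", "Salary": 66_000, "YearsCodePro": "3"},
    {"DevType": "Data scientist or machine learning specialist;Data engineer", "Salary": 80_000, "YearsCodePro": "Less than 1 year"},
    {"DevType": "Data engineer", "Salary": 60_000, "YearsCodePro": "5"},
]
for role, stats in role_summary(rows).items():
    print(role, stats)
```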

This prompted us to explore the data further and look at the salary trend across years of experience for each role.

The median salary for each level of professional experience by developer role

Analyzing salary across the different levels of professional coding experience, we can see that Data Scientists are the second-highest-paying role at all levels of experience, after Site Reliability Engineers and followed by Data Engineers and DevOps Specialists.
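One way to compare roles level by level is to group respondents by experience and take the median salary for every (role, level) pair, then rank roles within each level. In this sketch the bucket edges are hypothetical (the analysis may well use single years instead), and the respondents are made up.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import median

# Hypothetical experience buckets: 0-2, 3-5, 6-10 and 11+ years.
EDGES = [3, 6, 11]
LABELS = ["0-2", "3-5", "6-10", "11+"]


def bucket(years):
    """Map years of professional coding experience to a bucket label."""
    return LABELS[bisect_right(EDGES, years)]


def median_by_role_and_bucket(rows):
    """Median salary for every (role, experience bucket) pair."""
    groups = defaultdict(list)
    for r in rows:
        for role in r["roles"]:
            groups[(role, bucket(r["years"]))].append(r["salary"])
    return {key: median(values) for key, values in groups.items()}


def rank_roles(medians, label):
    """Roles in one bucket, from highest to lowest median salary."""
    in_bucket = [(role, m) for (role, b), m in medians.items() if b == label]
    return [role for role, _ in sorted(in_bucket, key=lambda rm: rm[1], reverse=True)]


rows = [
    {"roles": ["Site reliability engineer"], "years": 4, "salary": 90_000},
    {"roles": ["Data scientist"], "years": 4, "salary": 80_000},
    {"roles": ["Data engineer", "Data scientist"], "years": 5, "salary": 75_000},
    {"roles": ["Data scientist"], "years": 1, "salary": 50_000},
]
medians = median_by_role_and_bucket(rows)
print(rank_roles(medians, "3-5"))
```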

What correlates with a higher salary for data scientists?

In particular, we tried to see how the following features correlate with a higher salary for professional data scientists:

  • Level of education
  • Undergraduate major
  • Size of company
  • Combination of role titles
  • Coding as a hobby
  • Contributing to Open Source projects

To do so, we fitted a linear regression model and looked at the 20 features with the highest coefficients.
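This step can be sketched with only the standard library: one-hot encode the categorical answers, fit a linear model, and rank features by coefficient. This is not the author’s code. The column names and data are hypothetical. Ranking by absolute coefficient, so that strongly negative features such as a low-paying country also surface, is our assumption. A tiny ridge penalty, which the original may not have used, keeps the system solvable, because the indicator columns of each question always sum to the intercept column.

```python
def one_hot(rows, columns):
    """Turn categorical fields into 0/1 indicator features named 'Column_value'."""
    names = sorted({f"{c}_{r[c]}" for r in rows for c in columns})
    index = {name: i for i, name in enumerate(names)}
    matrix = []
    for r in rows:
        x = [0.0] * len(names)
        for c in columns:
            x[index[f"{c}_{r[c]}"]] = 1.0
        matrix.append(x)
    return names, matrix


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_linear(matrix, y, alpha=1e-3):
    """Least squares with an unpenalized intercept plus a tiny ridge penalty alpha."""
    xs = [[1.0] + row for row in matrix]  # column 0 is the intercept
    p = len(xs[0])
    gram = [[sum(r[i] * r[j] for r in xs) for j in range(p)] for i in range(p)]
    for i in range(1, p):  # penalize every coefficient except the intercept
        gram[i][i] += alpha
    rhs = [sum(r[i] * t for r, t in zip(xs, y)) for i in range(p)]
    w = solve(gram, rhs)
    return w[0], w[1:]


def top_features(names, coefs, k=20):
    """Features sorted by absolute coefficient, largest first."""
    return sorted(zip(names, coefs), key=lambda nc: abs(nc[1]), reverse=True)[:k]


rows = [
    {"Country": "United States", "EdLevel": "Doctoral"},
    {"Country": "United States", "EdLevel": "Bachelor's"},
    {"Country": "India", "EdLevel": "Doctoral"},
    {"Country": "India", "EdLevel": "Bachelor's"},
]
salaries = [100_000, 100_000, 20_000, 20_000]
names, matrix = one_hot(rows, ["Country", "EdLevel"])
intercept, coefs = fit_linear(matrix, salaries)
for name, coef in top_features(names, coefs, k=3):
    print(f"{name}: {coef:+.0f}")
```

In this toy data only the country matters, so the two country indicators get large coefficients of opposite sign and the education indicators stay near zero. In practice a library regressor such as scikit-learn’s `LinearRegression` would replace the hand-written solver. The version above just keeps the sketch dependency-free.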

Top 20 features that correlate with salary for data scientists

The feature that correlated the most with a higher salary was the country of residence. The USA, Switzerland, Israel, Denmark, Canada, Australia, Germany, and the UK are, unsurprisingly, among the top-paying countries. Other countries of residence, like Russia and India, were negatively correlated with salary.

As for educational background, there is a positive correlation between a doctoral degree (Ph.D., Ed.D., etc.) and a high salary for data scientists.

However, when a data scientist role title was combined with that of an academic researcher, we observed a negative correlation for salary.

So, a Ph.D. does pay more…as long as you finish it!

The size of the company also seems to play a role. Working for an organization with more than 10k employees or working as a freelancer was positively correlated with salary.

So, go big… or go home!

Finally, the undergraduate major was not among the top correlations with salary for Data Scientists, and neither were coding as a hobby or the frequency of contributing to open source projects.

Conclusion

To sum up the key takeaways of the analysis:

Data scientists have a different educational background compared to other developer roles.

  • 60% of data scientists have completed a Master’s or a Ph.D. degree compared to 30% of developers in general.
  • 48% of data scientists have completed a CS-related degree compared to 61% of developers in general.

Data scientists are one of the most gender-balanced developer roles, but there is still a long way to go.

  • There was a big gender gap in all developer roles. However, data scientists and front-end developers had the best ratios of women to men, roughly 1 to 12, almost 3 times better than DevOps specialists.

Data scientists are paid more than other developer roles.

  • In absolute terms, the median global annual salary of data scientists is $66k, while the median for all developers is $58.5k. Overall, data scientists were one of the top-earning roles, after SREs, DevOps specialists, and data engineers.
  • Among developers with the least average professional experience, data scientists were the highest-paid. Data scientist was also the second-highest-paying role across all experience levels.

Salary correlations

  • The top correlated feature with salary was the country of residence, where countries like the USA, Switzerland, the UK, Canada, and Australia had a positive correlation.
  • There was a positive correlation between salary and having a Ph.D., as well as working for a company with more than 10k employees or being a freelancer.
  • When a data scientist role was combined with that of an academic researcher, there was a negative correlation.

Thank you for reading!

Feel free to reach out on LinkedIn or check out the code on GitHub.

Note: The analysis was conducted as part of Udacity’s Data Science Nanodegree Program.
